Introduction

Ascochyta rabiei is a necrotrophic fungus causing Ascochyta blight disease of chickpea, a disease that can cause substantial damage to crops and reduce yield and profit. The disease poses a major risk to the chickpea industry in Australia and worldwide; significant efforts are therefore invested in better understanding and monitoring the A. rabiei population and in identifying the most aggressive isolates, which are then used to screen breeding material to develop more resistant (or less susceptible) varieties.

The aggressiveness level of A. rabiei is determined by assessing the disease symptoms on a differential set of host chickpea varieties/genotypes, in a process termed “phenotyping”. The Plant Pathology team in the Sustainable Food Production group at Griffith University (led by Prof Rebecca Ford and Dr Ido Bar) performs these assessments annually on a collection of A. rabiei isolates collected each year from the chickpea growing regions in Australia. The assessment is performed on 5-week-old chickpea seedlings that have been inoculated with A. rabiei isolates and grown under controlled environmental conditions conducive to the fungus establishing itself and causing disease on the plants. The exact phenotyping methods are described in (Mehmood et al. 2017; Sambasivam et al. 2020).

Aims

  • Review machine-learning methods and approaches that can be applied to automatically assess disease symptoms from captured images
  • Apply and optimise selected methods and assess their accuracy, efficiency and correlation with traditional plant pathology visual assessment method

Methods

Experimental Design

Disease assessment bioassays were managed and performed by a trained plant pathologist (Melody Christie). Each bioassay included 6 A. rabiei isolates, which were used to inoculate 5 differential chickpea host genotypes. The assays were performed in a blinded, replicated design, in which each isolate-host combination was replicated in 2 pots, each containing 5 chickpea seedlings. The seedlings were inoculated 3 weeks after germination and assessed 2 weeks post inoculation.

The assessment was then performed for each plant in each pot: leaf and stem disease scores were assigned on a qualitative ordinal 1-9 scale, along with the percentage of diseased area (Mehmood et al. 2017; Sambasivam et al. 2020). The disease assessment scores were recorded on a tablet using an online form. An example of a pot of chickpea seedlings infected with Ascochyta blight prior to assessment can be seen in Figure 1 below.

knitr::include_graphics("input_images/infected_plant_in pot.jpg")

Figure 1: Chickpea seedlings affected by Ascochyta rabiei

Image Capture

Images of the diseased plants were captured using an imaging light box (Neewer Upgraded Photo Light Box, 40x40x40 cm), a camera tripod (Neewer Camera Tripod Monopod with Rotatable Center Column) and a Sony A7III digital camera mounted and operated remotely using Sony’s Imaging Edge software (see setup in Figure 2). Each plant was laid flat in the imaging box and captured in at least 2 images, one for each side, prior to the visual pathology assessment.

knitr::include_graphics("input_images/image_capture_setup.jpg")

Figure 2: Image capture setup for disease assessment of Ascochyta rabiei

Image Processing for Disease Assessment

Tools and Packages

Image analysis, data processing and visualisation were performed in the R version 4.1.0 (2021-05-18) statistical computing environment (Ihaka and Gentleman 1996; R Core Team 2017). Specifically, packages from the tidyverse collection were used for general data import, wrangling, summarising and visualisation, and janitor was used for data cleansing and sanitising (Wickham et al. 2019; Firke 2021). The magick and pliman R packages were used for image analysis and automatic disease assessment (Ooms 2021; Olivoto 2022). The furrr package was used to process the images in parallel, along with tictoc to measure and evaluate processing times (Izrailev 2021; Vaughan and Dancho 2022).

# load custom functions from Ido's gist
devtools::source_gist("7f63547158ecdbacf31b54a58af0d1cc", filename = "util.R")
# install/load packages in one-liner
library(pacman)
# load/install from GitHub repos
p_load_gh(char = c("TiagoOlivoto/pliman", 
                   "DavisVaughan/furrr", 
                   "HenrikBengtsson/progressr"))
# load/install from CRAN/BioConductor
p_load(tidyverse, magick, tictoc, Microsoft365R, janitor, ISOweek) # pliman already loaded from GitHub above

plan(multisession, workers = future::availableCores()-2 ) # automatically detect available cores

Image Processing

The steps to process the images and assess disease followed the Phytopathometry tutorial of pliman.
First, representative palettes of the image background, healthy plant tissue and diseased (symptomatic) plant tissue were sampled from the images and defined (Figure 3).

h <- image_import("./palette/h.png")
s <- image_import("./palette/s.png")
b <- image_import("./palette/b.JPEG")
test_plant <- image_import("input_images/test4.JPG")
pliman::image_combine(test_plant, h, s, b)

Figure 3: Colour palettes representing background (b) colour, healthy (h) tissue and symptomatic (s) tissue

A custom function (process_image_pliman(), which can be found in the src/functions.R file in this repository) was then written to process the images with the following steps:

  1. Cropping – removal of the side walls of the light box and any uneven background
  2. Background removal – set background to transparent (optional step)
  3. Disease assessment using pliman
  4. Return results as a table with the image-based assessment for each plant in each pot

The function takes as arguments the input file and cropping parameters (pixels to trim from each side of the image), a reference colour and fuzz value to match when removing the background (if chosen), an optional flag to save the cropped or background-removed images, and a flag determining whether the disease assessment should be performed by pliman. This provides the flexibility to perform only part of the processing each time (e.g. just crop images, or assess disease on pre-cropped images). An example of the output of each image processing step can be seen in Figure 4.

To assist with determining the reference colour for background removal, another function (avg_bgcolor()) was written that averages the RGB colour over a defined range of pixels. It is used internally by process_image_pliman() when the user provides a valid Magick geometry string instead of a hex colour for the reference option.
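As an illustration of the approach, a minimal sketch of such an averaging helper is shown below. This is a hypothetical re-implementation for clarity only (the actual avg_bgcolor() lives in src/functions.R); it assumes the magick package and an arbitrary example geometry.

```r
library(magick)

# Hypothetical sketch of an avg_bgcolor()-style helper: average the RGB colour
# of a pixel region (given as a Magick geometry string) and return a hex colour.
avg_bgcolor_sketch <- function(image_file, geometry = "50x50+0+0") {
  region <- image_read(image_file) |>
    image_crop(geometry)                       # keep only the reference region
  px <- as.integer(image_data(region, channels = "rgb")) # 3 x width x height array
  avg <- apply(px, 1, mean)                    # mean of each RGB channel
  rgb(avg[1], avg[2], avg[3], maxColorValue = 255) # hex string, e.g. "#DAD1C8"
}
```

The returned hex string can then be passed as the reference colour for the background removal step.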

# source image processing functions
source("src/functions.R")
# run the function for 1 image (removing background, cropping and assessing)
process_image_pliman(image_file = "./input_images/test4.JPG", 
                     out_folder = "output/processed_images", 
                     assess_disease = TRUE,
                     crop = TRUE, 
                     save_cropped = FALSE, # should save the cropped file even if FALSE!
                     trans = TRUE,
                     h_pal = h, s_pal = s, b_pal = b)
cropped_plant <- image_import("output/processed_images/test4_cropped.jpg")
trans_plant <- image_import("output/processed_images/test4_cropped_transparent.png")
pliman::image_combine(test_plant, cropped_plant, trans_plant)

Figure 4: Example of the cropping and background removal performed during image processing

After the image processing was validated for one image in a bioassay folder, the rest of the images in the folder were processed in parallel.

# list of image files in the bioassay folder
bioassay_test <- list.files("../../Image Capture/20220429_Phenotyping", "B001_POT13_.+.JPG", full.names = TRUE)

bioassay_files <- tibble(original_file = list.files("../../Image Capture/20220429_Phenotyping", "B001_POT.+.JPG", full.names = TRUE)) %>% 
  mutate(meta_string=tools::file_path_sans_ext(basename(original_file))) %>% 
  separate(meta_string, into = c("bioassay", "pot", "plant", "replicate")) %>% 
  mutate(pot=as.integer(sub("POT", "", pot, ignore.case = TRUE)), plant=sub("PL", "", plant, ignore.case = TRUE))

step_size <- 20

steps <- seq(1,nrow(bioassay_files), by = step_size)
combined_disease_assessment_results <- vector("list", length = length(steps))
for(i in steps){
  
  next_step <- min(c(i+step_size-1, nrow(bioassay_files)))
  LogMsg(glue::glue("Processing images {i} to {next_step}, please be patient..."))
  tic() # start timer
  with_progress({
    p <- progressor(steps = length(bioassay_files$original_file[i:next_step]))
    combined_disease_assessment_results[[which(steps==i)]] <- bioassay_files$original_file[i:next_step] %>% 
      future_map_dfr(.f = ~{
        res_tibble <- process_image_pliman(image_file = .x, 
                                           out_folder = "../../Image Capture/20220429_Phenotyping/processed_images", 
                                           assess_disease = TRUE,
                                           crop = TRUE, 
                                           trim_bottom=350, trim_top=0,
                                           trim_left=0, trim_right=230,
                                           save_cropped = TRUE,
                                           trans = TRUE,
                                           set_fuzz = 18,
                                           reference="#DAD1C8",
                                           start_point = "+5400+3500",
                                           h_pal = h, s_pal = s, b_pal = b)
        p() # signal progress for this image
        return(res_tibble) # explicitly return the assessment tibble, not the progressor's value
      }, .options = furrr_options(seed = TRUE)) # suppress the RNG-seed warning from the parallel workers
  })
  toc() 
  
}

Evaluation of Image-based Assessments

Data Collection

After processing all images of a bioassay, the image metadata (filename, bioassay, pot, plant) and the disease assessment results were combined into a single data table (tibble) and summarised per plant (mean symptomatic area across all of a plant's images).
The visual pathology assessment data were downloaded directly from the SharePoint site capturing the online form submissions using the Microsoft365R package (Ooi 2021), combined with the image-based disease assessment data and saved to a .csv file.
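The SharePoint download step itself is not shown in the code below; a minimal sketch using Microsoft365R could look like the following (the site URL and file paths here are placeholders, not the actual ones):

```r
library(Microsoft365R)

# Hypothetical sketch: authenticate to the SharePoint site hosting the form
# responses and download the exported spreadsheet. The URL and paths are
# placeholders for illustration only.
site <- get_sharepoint_site(site_url = "https://example.sharepoint.com/sites/phenotyping")
drv <- site$get_drive() # default document library of the site
drv$download_file("Ascochyta_rabiei_phenotyping_form_data.xlsx",
                  dest = "bioassay_test/Ascochyta_rabiei_phenotyping_form_data.xlsx")
```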

# read pathology data 
discard_cols <- c("ID", "Start time", "Completion time", "Email", "Name", "Comments", "Images")
phenotyping_data <- readxl::read_excel("bioassay_test/Ascochyta_rabiei_phenotyping_form_data.xlsx") %>% 
  mutate(date = as.Date(`Start time`), week = ISOweek(date)) %>% 
  select(-one_of(discard_cols)) %>% 
  pivot_longer(contains("Plant"), names_to = "Trait", values_to = "Score") %>% 
  mutate(plant = sub(pattern= "Plant ", "", x = str_extract(Trait, "Plant \\d")), 
         Trait = sub(pattern= "\\s*Plant \\d\\s*", "", Trait),
         Score = as.numeric(Score)) %>%
  clean_names() %>% 
  group_by(bioassay_id, date, pot_number, trait, plant) %>% 
  summarise(score = mean(score, na.rm = TRUE)) %>% 
  pivot_wider(names_from = trait, values_from = score) %>% 
  rename(bioassay = bioassay_id, pot=pot_number)

# combine results ####
# process disease assessment results 
combined_results_df <- map_dfr(combined_disease_assessment_results, bind_rows) %>% # check results and extract metadata from filename
  mutate(meta_string=tools::file_path_sans_ext(basename(original_file))) %>% 
  separate(meta_string, into = c("bioassay", "pot", "plant", "replicate")) %>% 
  mutate(pot=sub("POT", "", pot, ignore.case = TRUE), plant=sub("PL", "", plant, ignore.case = TRUE)) %>% 
  group_by(bioassay, pot, plant) %>% 
  summarise(symptomatic_mean = mean(symptomatic, na.rm = TRUE), 
            assessment_SD=sd(symptomatic, na.rm = TRUE), 
            image_num=n()) %>% 
  left_join(phenotyping_data) %>% relocate(date, .before = 1) %>% 
  write_csv("output/20220429_Phenotyping_processed_data.csv")

This process was then repeated for all captured images in the bioassay folders, and the data were combined and merged with the bioassay design spreadsheet to assign an isolate-host combination to each pot in each bioassay.
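Repeating the processing across folders and merging with the design could be sketched as follows (process_bioassay_folder() is a hypothetical wrapper around the per-folder steps above, and the design spreadsheet name and columns are assumptions):

```r
library(tidyverse)

# Hypothetical wrapper assumed to run the per-folder processing shown above and
# return a tibble of per-plant assessment results with bioassay and pot columns.
bioassay_folders <- list.dirs("../../Image Capture", recursive = FALSE)

# Assumed design spreadsheet with bioassay, pot, isolate and host columns
design <- readxl::read_excel("bioassay_design.xlsx")

all_results <- map_dfr(bioassay_folders, process_bioassay_folder) %>%
  left_join(design, by = c("bioassay", "pot")) # assign isolate-host combination per pot
```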

Statistical Analysis and Visualisation

The correlation between the visual phenotyping disease assessments (leaf and stem 1-9 scores and percentage of diseased tissue area) and the image-based disease symptom assessment was visualised and assessed using the R packages ggstatsplot, GGally and ggcorrplot, as described in Correlation coefficient and correlation test in R (Soetewey 2020).

# load/install from CRAN/BioConductor
p_load(ggstatsplot, GGally, ggcorrplot, scales, ggthemr) # ggthemr provides the "pale" theme used below

Results

The images from each phenotyping bioassay were processed as detailed above, summarised per pot and presented in Table 1 below.

# Read processed image data ####
results_files <- list.files("output", pattern = "._Phenotyping_processed_data.csv",
                            full.names = TRUE)

processed_data <- map_dfr(results_files, ~ read_csv(.x) %>% mutate(date=as.Date(date))) %>% 
  clean_names() %>% mutate(across(c("pot", "plant", "image_num"), as.integer))
# summarise data
summarised_data <- processed_data %>% group_by(bioassay, pot) %>% 
  summarise(symptomatic_SD=sd(symptomatic_mean, na.rm=TRUE),
            symptomatic_mean=mean(symptomatic_mean, na.rm=TRUE), 
            lad_percent_mean=mean(lad_percent, na.rm=TRUE), 
            lad_percent_SD=sd(lad_percent, na.rm=TRUE),
            leaf_score_mean=mean(leaf_score, na.rm=TRUE), 
            leaf_score_SD=sd(leaf_score, na.rm=TRUE),
            stem_score_mean=mean(stem_score, na.rm=TRUE), 
            stem_score_SD=sd(stem_score, na.rm=TRUE),
            plant_num = n(),
            image_num = sum(image_num)) %>% 
  relocate(symptomatic_SD, .after = symptomatic_mean) %>% 
  ungroup() %>% 
  arrange(bioassay, pot) 
# prepare table for report
table_data <- summarised_data %>% select(-ends_with("SD")) %>% 
  mutate(across(any_of(c("symptomatic_mean", "lad_percent_mean")), 
                .fns = ~./100),
         across(where(is.double), .fns = ~round(., digits = 2) )) %>% 
  set_names(snakecase::to_title_case(names(.), abbreviations = "LAD"))
Table 1: Disease assessment comparison table

The correlations between the visual pathology and image-based disease assessments were plotted and assessed. The pairwise correlations between all measured variables are shown in Figures 5 and 6.

# Check correlation between all traits 
cor_data <- summarised_data %>% select(symptomatic_mean, lad_percent_mean, 
                                       leaf_score_mean, stem_score_mean) %>% 
  setNames(snakecase::to_any_case(names(.), case = "title", abbreviations = "LAD"))
ggpairs(cor_data)

Figure 5: Pairwise correlations between image-based and visual pathology disease assessment methods for A. rabiei

# Check correlation between all traits 
# plot correlation matrix
ggcorrmat(
  data = cor_data,
  type = "parametric", # parametric for Pearson, nonparametric for Spearman's correlation
  colors = c("darkred", "white", "steelblue") # change default colors
)

Figure 6: Correlations matrix of image-based and visual pathology disease assessment methods for A. rabiei

A more focused view of the correlation between the visual pathology assessment and the image-based assessment, produced with ggstatsplot, can be seen in Figure 7.

pale_theme <- ggthemr(palette = "pale", set_theme = FALSE, text_size = 14)
# ggplot(processed_data, aes(x=symptomatic_mean , y=lad_percent))
ggscatterstats(
  data = summarised_data,
  x = symptomatic_mean,
  y = lad_percent_mean,
  bf.message = FALSE
) + pale_theme$theme +
  # geom_errorbar(mapping = aes(ymin = lad_percent_mean - lad_percent_SD, 
  #                             ymax = lad_percent_mean + lad_percent_SD)) +
  # geom_errorbarh(mapping = aes(xmin = symptomatic_mean - symptomatic_SD, 
  #                             xmax = symptomatic_mean + symptomatic_SD)) +
  labs(y = "LAD (%)", x = "Symptomatic area (%)")

Figure 7: Correlation between image-based and visual pathology disease assessment methods for A. rabiei

Conclusion

Overall, 681 images of diseased plants were processed, representing 294 plants (70 pots covering 35 isolate-host combinations). We observed a moderate-high correlation (\(R^2=0.65\)) between the image-based disease assessment and the visual pathology assessment. The correlation of the qualitative stem and leaf disease scores with the visually assessed diseased leaf area (LAD %) was significantly stronger than their correlation with the image-based assessment, indicating that further improvements are required to increase the accuracy of the model. Nevertheless, the results are promising and demonstrate the advantages of image-based assessment: it is faster, does not require expert pathologists (and is therefore cheaper) and is objective, and thus less biased by the individual assessor’s skills, knowledge, mental state, fatigue, etc.

General information

This document was last updated on 2022-06-14 17:08:20 using R Markdown (built with R version 4.1.0 (2021-05-18)). R Markdown is a simple formatting syntax for authoring HTML, PDF and MS Word documents. It is especially powerful for authoring dynamic documents and reports that can execute code chunks and use the results in the output. For more details on using R Markdown see http://rmarkdown.rstudio.com and the R Markdown cheatsheet.


Bibliography

Soetewey A (2020) Correlation coefficient and correlation test in R. Stats and R
Firke S (2021) Janitor: Simple tools for examining and cleaning dirty data.
Ihaka R, Gentleman R (1996) R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5:299–314. doi: 10.1080/10618600.1996.10474713
Izrailev S (2021) Tictoc: Functions for timing r scripts, as well as implementations of stack and list structures.
Mehmood Y, Sambasivam P, Kaur S, et al. (2017) Evidence and Consequence of a Highly Adapted Clonal Haplotype within the Australian Ascochyta rabiei Population. Frontiers in Plant Science 8:
Olivoto T (2022) Lights, camera, pliman! An R package for plant image analysis. Methods in Ecology and Evolution 13:789–798. doi: 10.1111/2041-210X.13803
Ooi H (2021) Microsoft365R: Interface to the ’microsoft 365’ suite of cloud services.
Ooms J (2021) Magick: Advanced graphics and image-processing in r.
R Core Team (2017) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria
Sambasivam P, Mehmood Y, Bar I, et al. (2020) Evidence of recent increased pathogenicity within the Australian Ascochyta rabiei population. bioRxiv 2020.06.28.175653. doi: 10.1101/2020.06.28.175653
Vaughan D, Dancho M (2022) Furrr: Apply mapping functions in parallel using futures.
Wickham H, Averick M, Bryan J, et al. (2019) Welcome to the tidyverse. Journal of Open Source Software 4:1686. doi: 10.21105/joss.01686